1 00:00:09,001 --> 00:00:14,000 Good afternoon, everybody. It's lovely to see you all here today joining us for 2 00:00:14,000 --> 00:00:18,001 the second session in our series with the Internet Archive, D-Web, and Library 3 00:00:18,001 --> 00:00:23,000 Futures. The entire series is titled Imagining a Better Online World: Exploring 4 00:00:23,000 --> 00:00:26,001 the Decentralized Web. And today we'll be talking about using decentralized 5 00:00:26,001 --> 00:00:31,000 storage to keep your materials safe. My name is Davis Erin Anderson. I'm 6 00:00:31,000 --> 00:00:35,000 assistant director for programs and partnerships at Metro. Please say hello in 7 00:00:35,000 --> 00:00:39,000 chat. We'd love to know who's out there: where you're from, what your name is, 8 00:00:39,001 --> 00:00:42,001 what your interest in this topic is. We'd love to know who's in the audience and 9 00:00:42,001 --> 00:00:46,001 hear from you a little bit as we get started. So Metro is a multi-type 10 00:00:46,001 --> 00:00:50,001 consortium. We serve the five boroughs of New York City and Westchester County. 11 00:00:50,001 --> 00:00:54,001 We're a service provider. We do events like this and partnership programs like 12 00:00:54,001 --> 00:00:57,001 the one you're attending today. We have a group that works on software 13 00:00:57,001 --> 00:01:01,001 development. We provide delivery services and make sure that knowledge can be 14 00:01:01,001 --> 00:01:05,001 spread equitably throughout our service area. So we really care a lot about the 15 00:01:05,001 --> 00:01:10,000 future of how information moves. And so we are pleased and honored today to 16 00:01:10,000 --> 00:01:13,001 support the work that folks are doing at the Internet Archive. We wanted to hear a 17 00:01:13,001 --> 00:01:18,000 little bit more about what they envision for the future of the web. So we're 18 00:01:18,000 --> 00:01:22,000 running a six-part series. This is the second part.
We'll drop a link into chat 19 00:01:22,000 --> 00:01:26,000 that lets you see where to go to register for our upcoming sessions as well as 20 00:01:26,000 --> 00:01:30,001 check back on the resources we're providing for this one and the past sessions as 21 00:01:30,001 --> 00:01:34,001 well. So if you would, please drop your questions and your comments into chat as 22 00:01:34,001 --> 00:01:38,001 well. We had a really robust and active conversation going for our first session 23 00:01:38,001 --> 00:01:42,001 and we'd love to see that happen here again. We're also providing resource guides 24 00:01:42,001 --> 00:01:47,000 to go with each and every one of the six parts of the series. So please also look 25 00:01:47,000 --> 00:01:51,001 at chat for a link to the current guide, and please stay tuned to your inbox. If 26 00:01:51,001 --> 00:01:55,001 you registered, you'll receive a PDF copy as well. So it's my pleasure to 27 00:01:55,001 --> 00:01:58,001 introduce you to Wendy Hanamura. Wendy is 28 00:01:58,001 --> 00:02:00,000 Director of Partnerships at the Internet Archive. 29 00:02:00,001 --> 00:02:05,001 She planned the first-ever Decentralized Web Summit a few years back. In the past 30 00:02:05,001 --> 00:02:09,000 six years, she's helped to guide the global growth of the decentralized web. So 31 00:02:09,000 --> 00:02:13,001 she's really the expert on this topic. And she's here today to co-produce the six-part 32 00:02:13,001 --> 00:02:17,001 series, Imagining a Better Online World: Exploring the Decentralized Web. So 33 00:02:17,001 --> 00:02:21,000 thank you so much, Wendy. Over to you. And it's great to see you again. Thank 34 00:02:21,000 --> 00:02:24,000 you, Davis. And thanks to all of you for being here today. I'm seeing friends 35 00:02:24,000 --> 00:02:29,000 from Berlin and Argentina, and many, many from New York and Florida. We're so happy 36 00:02:29,000 --> 00:02:34,000 that you can be here to learn a little bit about decentralized storage.
Now in 37 00:02:34,000 --> 00:02:38,000 this webinar, we're going to be exploring with you a new set of decentralized 38 00:02:38,000 --> 00:02:44,000 technologies that may help you to preserve and provide access to your media. So 39 00:02:44,000 --> 00:02:48,001 here's the game plan for the next 60 minutes. I'm going to start by giving you an 40 00:02:48,001 --> 00:02:53,000 overview of some of the problems that decentralized storage could help to solve. 41 00:02:54,000 --> 00:02:58,000 Then I have invited a friend of mine, the founder of Starling Lab, to share with 42 00:02:58,000 --> 00:03:03,000 you how his group is working with many, many cultural institutions to keep their 43 00:03:03,000 --> 00:03:08,001 most critical and important materials safe. We also want to show you this tech in 44 00:03:08,001 --> 00:03:13,000 action. So I've invited two people to demonstrate what they've been working on. 45 00:03:13,001 --> 00:03:18,000 First, an engineer of ours from the Internet Archive is going to be showing you 46 00:03:18,000 --> 00:03:24,001 how we've been experimenting with saving web archives at scale on Filecoin. And a 47 00:03:24,001 --> 00:03:29,000 senior engineer from Storj, the decentralized storage company, is going to show 48 00:03:29,000 --> 00:03:34,001 you how we've been storing LibriVox audiobooks in decentralized storage. Now both 49 00:03:34,001 --> 00:03:39,000 of these collections, the web archives and the audiobooks, were created 50 00:03:39,000 --> 00:03:43,001 collaboratively by communities. And I think that's the real promise here, that 51 00:03:43,001 --> 00:03:48,001 you could take collaborative collections and perhaps store them and preserve them 52 00:03:48,001 --> 00:03:53,000 collaboratively as well. So let's start by thinking about some of the challenges. 53 00:03:54,000 --> 00:03:58,000 Now many of you are archivists, you're librarians, you run cultural institutions, 54 00:03:58,000 --> 00:04:04,001 so this will be very familiar.
Your collections are ever expanding in the physical 55 00:04:04,001 --> 00:04:07,001 world, but also in the digital realm. 56 00:04:08,000 --> 00:04:13,001 Digital objects may be even harder to store, right? How do you keep things safe, 57 00:04:13,001 --> 00:04:19,000 not only from floods and fires, but also from hackers? How do you make 58 00:04:19,000 --> 00:04:25,000 them accessible in a time of broken links and content drift? How do 59 00:04:25,000 --> 00:04:30,001 you make sure that your data is trustworthy, especially in an era when deepfakes 60 00:04:30,001 --> 00:04:36,000 are on the rise? Then there's the scale of digital holdings, which seem to be 61 00:04:36,000 --> 00:04:42,000 enormous. And isn't it true that weeding digital objects feels a little bit wrong 62 00:04:42,000 --> 00:04:48,001 since they're just bits? How do you weed an ever-growing digital collection? And 63 00:04:48,001 --> 00:04:54,000 what about long-term preservation, the sustainability of these collections? How 64 00:04:54,000 --> 00:04:56,001 do you do digital storage for centuries? 65 00:04:57,001 --> 00:05:04,000 And let's not forget the issue of cost. It is so hard to predict the future costs 66 00:05:04,000 --> 00:05:08,000 of decentralized storage, especially when technology is changing all the time. 67 00:05:09,000 --> 00:05:15,001 Now that takes us to this. Think of the decentralized web as a stack with every 68 00:05:15,001 --> 00:05:20,001 layer of the web stack potentially decentralized. When you take all of these 69 00:05:20,001 --> 00:05:25,001 decentralized technologies together, that's what we call the decentralized web. 70 00:05:25,001 --> 00:05:30,000 And you'll notice in this diagram that the bottom layer is decentralized storage. 71 00:05:30,000 --> 00:05:35,000 That's the layer we're going to be exploring today. Conceptually, decentralized 72 00:05:35,000 --> 00:05:41,001 storage allows you to store your data across a peer-to-peer network of servers.
73 00:05:42,000 --> 00:05:45,001 But so does Amazon's cloud, right? So what's the difference? 74 00:05:45,001 --> 00:05:50,000 I would say that the difference here is really that not only is your storage 75 00:05:50,000 --> 00:05:57,000 location distributed, but your storage management is also decentralized. That 76 00:05:57,000 --> 00:06:02,000 way you can't take out just one central control entity like Amazon and have the 77 00:06:02,000 --> 00:06:05,001 entire system go down. So what is the promise? 78 00:06:06,000 --> 00:06:10,000 What does decentralized storage offer? Well, first there's the concept of 79 00:06:10,000 --> 00:06:14,000 resiliency. Now, we're very familiar with that in the library world. There's 80 00:06:14,000 --> 00:06:19,000 LOCKSS: Lots Of Copies Keep Stuff Safe. So we know that if you distribute copies 81 00:06:19,000 --> 00:06:23,001 across different geographic lines, geopolitical lines, they're going to be safer. 82 00:06:24,001 --> 00:06:28,001 Then there's the concept of persistence. Now, this is something that a lot of 83 00:06:28,001 --> 00:06:32,001 people get wrong when they think about the decentralized web. Just because you 84 00:06:32,001 --> 00:06:38,001 cut up a file and put pieces of it on different servers does not mean that those 85 00:06:38,001 --> 00:06:44,000 servers are guaranteed to keep your files forever. Now, persistence would mean 86 00:06:44,000 --> 00:06:48,001 that you'd have to have a guarantee somehow built in that the people who hold 87 00:06:48,001 --> 00:06:54,000 your copies will hold them forever, or at least for a long time. So how do you ensure 88 00:06:54,000 --> 00:06:58,001 persistence? Well, in truth, I don't think we're really sure about that. But 89 00:06:58,001 --> 00:07:04,000 organizations like Filecoin and Storj are using a combination of incentives and 90 00:07:04,000 --> 00:07:11,000 shared protocols and contracts to try to ensure persistence.
Next, I think 91 00:07:11,000 --> 00:07:15,001 this step, self-certification, is the most important attribute 92 00:07:15,001 --> 00:07:17,000 of decentralized storage. 93 00:07:18,001 --> 00:07:25,001 You know, here every item is assigned a unique, immutable hash, a persistent ID. 94 00:07:26,000 --> 00:07:31,000 And you use this ID to find your things wherever they are and to check how many 95 00:07:31,000 --> 00:07:38,000 people have copies of them. So this is something 96 00:07:38,000 --> 00:07:40,000 we call content addressing. 97 00:07:40,001 --> 00:07:44,001 In Web 2.0, you find things based on where they're located, right? You have a 98 00:07:44,001 --> 00:07:49,000 URL that takes you to a place on a server. Well, in Web 3.0, or the decentralized 99 00:07:49,000 --> 00:07:55,001 web, the ID stays with the content itself. And if the content changes, so does 100 00:07:55,001 --> 00:08:01,000 the hash. So anytime something is altered, you get a new hash. And ostensibly, 101 00:08:01,001 --> 00:08:05,001 self-certification is what allows you to ensure the provenance and 102 00:08:05,001 --> 00:08:11,001 authenticate an item. Finally, there is the goal of interoperability. 103 00:08:12,000 --> 00:08:16,000 I think it's pretty true that right now we have a lot of 104 00:08:16,000 --> 00:08:18,000 silos where our materials live. 105 00:08:18,001 --> 00:08:21,001 And when you want to work collaboratively on a shared data 106 00:08:21,001 --> 00:08:23,000 set, that can be very problematic. 107 00:08:23,001 --> 00:08:29,000 Now in the utopian version of decentralized storage, you can have collaborative, 108 00:08:29,000 --> 00:08:35,000 authenticated, co-hosted collections. And these collections would be less prone 109 00:08:35,000 --> 00:08:39,001 to censorship because you can't block just one URL and take down the entire 110 00:08:39,001 --> 00:08:45,000 collection.
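Content addressing, as described above, can be illustrated with a small sketch. It is a simplification under stated assumptions, not how any particular system encodes identifiers: the ID here is a plain SHA-256 hex digest (IPFS, for example, wraps the digest in a CID with extra codec and encoding prefixes), and `content_id` and `ContentStore` are hypothetical names. It shows the two properties mentioned in the talk: changing the content changes the ID, and whoever retrieves an item can check it against the ID they asked for.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the bytes themselves (plain SHA-256 hex).

    Simplification: real systems such as IPFS wrap the digest in a CID with
    codec and encoding prefixes; the bare digest is used here for clarity.
    """
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Hypothetical toy store: items are filed and fetched by their own hash."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._items[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        data = self._items[cid]
        # Self-certification: re-hash what came back and compare it with the
        # ID that was requested, regardless of who served the bytes.
        if content_id(data) != cid:
            raise ValueError("content does not match its identifier")
        return data


store = ContentStore()
original = store.put(b"Oral history transcript, v1")
edited = store.put(b"Oral history transcript, v2")
print(original != edited)            # True: any change yields a new ID
print(store.get(original).decode())  # Oral history transcript, v1
```

Because the check depends only on the bytes and the ID, it works the same whether the item came from your own server or from a stranger's node.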
They're also perhaps harder to hack because there's not one single 111 00:08:45,000 --> 00:08:49,000 honeypot to go after. They may be easier to share. 112 00:08:49,001 --> 00:08:53,001 Taken together, resiliency, persistence, self-certification, and 113 00:08:53,001 --> 00:08:58,001 interoperability: that is the promise of decentralized storage. But it is still 114 00:08:58,001 --> 00:09:03,000 early days. So whether or not we can deliver on those things is something we're 115 00:09:03,000 --> 00:09:08,000 testing. Now it is my deep pleasure to bring on Jonathan Dotan. He's the founder 116 00:09:08,000 --> 00:09:14,000 of Starling Lab, which is the first major research laboratory devoted to Web 3.0 117 00:09:14,000 --> 00:09:19,000 technologies. It's affiliated with Stanford and USC. And I know that Starling has 118 00:09:19,000 --> 00:09:23,001 been working for quite a while with the Shoah Foundation to make sure that 119 00:09:23,001 --> 00:09:29,000 Holocaust testimony videos are kept safe and persistent. But here's a fun fact. I 120 00:09:29,000 --> 00:09:34,001 first met Jonathan Dotan back in 2018 when he was the consultant for HBO's Silicon 121 00:09:34,001 --> 00:09:39,001 Valley. And it was Jonathan Dotan who convinced the showrunners to introduce a 122 00:09:39,001 --> 00:09:43,001 storyline about a new internet, a decentralized internet. 123 00:09:43,001 --> 00:09:48,000 And that's how he came to be involved with us in the D-Web community. So welcome, 124 00:09:48,000 --> 00:09:52,000 Jonathan Dotan, founder of Starling Lab. Thanks so much, Wendy, for having me.
125 00:09:52,001 --> 00:09:56,000 And to the entire community that's assembled here, I can't think of a more 126 00:09:56,000 --> 00:09:59,001 appropriate group of folks to be speaking to about decentralized storage, because 127 00:09:59,001 --> 00:10:06,000 archiving institutions and libraries certainly have a unique power to provide a new 128 00:10:06,000 --> 00:10:12,001 layer of trust for communities through preservation. And 129 00:10:12,001 --> 00:10:16,001 I'm really excited to help bring you into the fold, to help answer any questions, 130 00:10:16,001 --> 00:10:20,001 and potentially even to inspire you about the possibilities. At the Starling Lab, we've 131 00:10:20,001 --> 00:10:24,001 been working on what we call a framework for data integrity that allows you to think end 132 00:10:24,001 --> 00:10:30,001 to end about how you capture, store, and verify information. And the 133 00:10:30,001 --> 00:10:35,001 page that we really are working from here is one that was written many years ago. 134 00:10:36,001 --> 00:10:41,001 So I want to start with a little bit of context today, share with you a prototype 135 00:10:41,001 --> 00:10:45,001 of some of the early work that we've done, and then get into some of the 136 00:10:45,001 --> 00:10:47,001 learnings and how they might apply to 137 00:10:47,001 --> 00:10:49,000 you and some of your archival use cases. 138 00:10:50,001 --> 00:10:55,000 So to begin, Wendy's talked a little bit about the goals of decentralization, but 139 00:10:55,000 --> 00:10:59,001 I want to start even a little bit more upstream from there and just get into a 140 00:10:59,001 --> 00:11:04,001 very simple but also, I think, poignant understanding of how decentralization 141 00:11:04,001 --> 00:11:10,001 works.
In the prior model of communication systems, when, let's say, AT&T, as an 142 00:11:10,001 --> 00:11:16,000 example, was the dominant guarantor of communications infrastructure, you 143 00:11:16,000 --> 00:11:20,001 had to rely on AT&T as a centralized node, right? Basically, all the data went 144 00:11:20,001 --> 00:11:25,001 into AT&T and then went out of AT&T in order to be passed along, in the case of 145 00:11:25,001 --> 00:11:30,001 something like voice communications. The distributed web was not something that was 146 00:11:30,001 --> 00:11:35,000 first done, let's say, in the 90s when the web took off. It was actually a 147 00:11:35,000 --> 00:11:40,000 part of the original architecture of the internet. And when Paul Baran over at 148 00:11:40,000 --> 00:11:44,001 RAND set forth how a digital network of this kind might work, he 149 00:11:44,001 --> 00:11:49,000 explained that what was critical was to establish nodes that were interoperable, 150 00:11:49,000 --> 00:11:53,000 so that if any of those nodes went down, their functions could be taken over 151 00:11:53,000 --> 00:11:56,000 by the others, and that there was neutrality: the network itself 152 00:11:56,000 --> 00:12:00,000 didn't discriminate but simply passed along the information. And that meant that 153 00:12:00,000 --> 00:12:05,000 you could allow the end user to hold all of the intelligence and the rich 154 00:12:05,000 --> 00:12:08,001 applications that might exist on this type of network. 155 00:12:09,000 --> 00:12:14,001 And so this concept was really baked into a larger philosophical framework that 156 00:12:14,001 --> 00:12:19,000 was put into play by Doug Engelbart and his team in the Augmenting Human Intellect
And what it deposited is that for a decentralized network of knowledge 158 00:12:23,000 --> 00:12:30,000 to exist, you needed to have at the end computers that could be used by the 159 00:12:30,000 --> 00:12:34,000 average user. So a personal computer and the concept of the internet were 160 00:12:34,000 --> 00:12:37,000 actually birthed out of the same framework. That's not actually commonly 161 00:12:37,000 --> 00:12:42,001 understood, but it makes sense on many levels. And as you look at the history of 162 00:12:42,001 --> 00:12:47,000 the web as it took off in the early 90s, it's no accident that the first HTTP 163 00:12:47,000 --> 00:12:49,000 server was actually a personal computer. 164 00:12:49,001 --> 00:12:54,000 And funny enough, this is Tim Berners-Lee's machine over at CERN. You can see 165 00:12:54,000 --> 00:12:57,001 there's a sticker that said, to clarify that this is not just a personal 166 00:12:57,001 --> 00:13:01,000 computer, but it's actually a server. And he explained to people that they 167 00:13:01,000 --> 00:13:08,000 shouldn't turn off this machine because it was basically hosting information. So 168 00:13:08,000 --> 00:13:12,000 from that, those early promising visions around decentralized technologies and 169 00:13:12,000 --> 00:13:16,001 how one might want to build a global internet infrastructure, there's a growing 170 00:13:16,001 --> 00:13:22,000 realization that now, as the internet has taken off and working at scale with so 171 00:13:22,000 --> 00:13:26,001 many parts of our lives touching it, that something really is a myth. 
That as the 172 00:13:26,001 --> 00:13:30,000 internet has become more centralized and dominated by corporations with 173 00:13:30,000 --> 00:13:35,001 often monopolistic practices, there have been tremendous issues with 174 00:13:35,001 --> 00:13:39,000 establishing trust in this type of system, and that we might want to return to 175 00:13:39,000 --> 00:13:42,001 decentralization to restore trust and to 176 00:13:42,001 --> 00:13:44,001 restore a sense of fairness within the internet. 177 00:13:44,001 --> 00:13:48,001 So that's where a lot of these ideas are coming from, to be clear. This is 178 00:13:48,001 --> 00:13:53,001 actually how the original internet was designed and meant to be. The 179 00:13:53,001 --> 00:13:58,001 Decentralized Web Summit in 2018, which is how I really got into the fold, was a 180 00:13:58,001 --> 00:14:02,001 program that was hosted by Wendy and Brewster over at the Internet Archive. And it 181 00:14:02,001 --> 00:14:06,000 was really transformative in bringing a number of different individuals together 182 00:14:06,000 --> 00:14:12,000 to think about these issues. And it's important to mention that the 183 00:14:12,000 --> 00:14:16,000 importance of this event was really that it was a cultural event as well as 184 00:14:16,000 --> 00:14:20,000 a technical event, in which people were thinking about shared values that were 185 00:14:20,000 --> 00:14:24,000 distinctly different from things that you may have heard of around blockchains or 186 00:14:24,000 --> 00:14:28,001 cryptocurrencies. This was really a group of people that were concerned with how 187 00:14:28,001 --> 00:14:31,001 you guarantee access to knowledge and how you can use decentralized 188 00:14:31,001 --> 00:14:36,001 technologies for things like preservation, which is our topic today. So the 189 00:14:36,001 --> 00:14:41,001 Starling Lab really came out of that rich tradition.
And in working with the USC 190 00:14:41,001 --> 00:14:45,001 Shoah Foundation and Stanford's Department of Electrical Engineering, we've found 191 00:14:45,001 --> 00:14:49,000 this incredible opportunity to bring together experts to think about how we can 192 00:14:49,000 --> 00:14:54,001 deploy decentralized technologies to advance human rights. So we work with a 193 00:14:54,001 --> 00:15:01,000 number of different industry partners. And really the founding work and research 194 00:15:01,000 --> 00:15:04,000 that we did was with the USC Shoah Foundation's Visual History Archive. 195 00:15:04,001 --> 00:15:08,000 As Wendy mentioned, this is an archive that deals with the testimony 196 00:15:08,000 --> 00:15:09,001 of the survivors of genocide. 197 00:15:10,000 --> 00:15:15,000 It started 30 years ago by cataloging the stories of survivors of the Holocaust, 198 00:15:15,001 --> 00:15:19,001 but it has expanded, and now they're working on, I believe, their 14th 199 00:15:19,001 --> 00:15:21,001 genocide collection as of last week. 200 00:15:23,001 --> 00:15:27,001 And sadly, of course, that number continues to increase. There are over 55,000 201 00:15:27,001 --> 00:15:31,001 survivor testimonies. On average, each is about two and a half hours and 202 00:15:31,001 --> 00:15:37,000 several gigabytes. So it's a massive four-petabyte 203 00:15:37,000 --> 00:15:42,000 collection. And currently it sits in three different data centers, all with 204 00:15:42,000 --> 00:15:47,000 state-of-the-art tape-based archival systems. But working with them, we really 205 00:15:47,000 --> 00:15:51,000 reimagined a continuum of preservation that goes beyond just the data centers 206 00:15:51,000 --> 00:15:57,000 maintained by the USC Shoah Foundation.
Bravely, their CTO, Sam Gustman, 207 00:15:57,000 --> 00:16:02,000 has been working with us on figuring out a way to take the entire four petabytes 208 00:16:02,000 --> 00:16:07,000 of the Shoah Foundation's archive and put it onto the distributed web. And then 209 00:16:07,000 --> 00:16:10,001 in addition, I should mention that he's also looking at longer-term media 210 00:16:10,001 --> 00:16:14,001 storage, like storage on silica and DNA, et cetera. So they're quite innovative 211 00:16:14,001 --> 00:16:19,000 and progressive in thinking about preservation. The content that we've 212 00:16:19,000 --> 00:16:24,000 been working on centers on genocide testimonies. We've expanded our 213 00:16:24,000 --> 00:16:27,001 testimony collections with them to understand the whole life cycle of how 214 00:16:27,001 --> 00:16:34,001 testimony is collected, preserved, and indexed. We've gone to Iraq, Los 215 00:16:34,001 --> 00:16:39,000 Angeles, the Amazon rainforest. We've been working in Syria on preserving 216 00:16:39,000 --> 00:16:44,000 testimony that could potentially be useful not only for humanitarian causes, but 217 00:16:44,000 --> 00:16:48,001 also for accountability. So we look at preservation for those purposes. 218 00:16:49,001 --> 00:16:52,001 And finally, we've been working with news organizations like Reuters to look at 219 00:16:52,001 --> 00:16:57,000 their archives. And most recently, we finished a project with them last year 220 00:16:57,000 --> 00:17:02,000 called 78 Days, which catalogued each of the 78 days between the election and 221 00:17:02,000 --> 00:17:08,001 the inauguration. That included January 6th as well. What we found is that 222 00:17:08,001 --> 00:17:12,000 what we're really creating is a set of solutions centered not just around 223 00:17:12,000 --> 00:17:17,000 preservation, but also around restoring trust.
And so what I want to show 224 00:17:17,000 --> 00:17:21,000 you today is how we think about how that might work end to end. And really, it 225 00:17:21,000 --> 00:17:27,000 begins by following the natural life cycle of how you go and generate data, which 226 00:17:27,000 --> 00:17:30,001 would start with capturing. And then you move to storage. And then finally, 227 00:17:30,001 --> 00:17:36,000 verification. And each of those steps is critical to ensuring 228 00:17:36,000 --> 00:17:40,000 that you have a data set that can be trusted. And what I want to show you today 229 00:17:40,000 --> 00:17:44,000 is how decentralized systems can actually guarantee trust at each of these three 230 00:17:44,000 --> 00:17:49,001 stages. So let's begin with capturing on something like, let's say, a mobile 231 00:17:49,001 --> 00:17:54,000 phone. In our case, what we've done is we've taken not only the phone's ability 232 00:17:54,000 --> 00:17:58,000 to take a photo, but also looked at all the other sensor information that exists 233 00:17:58,000 --> 00:18:03,001 on the phone, like GPS for location, network information to establish a relative 234 00:18:03,001 --> 00:18:08,001 location, the gyroscope to understand the relative position of the camera, and 235 00:18:08,001 --> 00:18:11,001 even things like time and date, right? These are all critical pieces of metadata 236 00:18:11,001 --> 00:18:17,000 that the phone is able to generate. What we do is we take that metadata and we 237 00:18:17,000 --> 00:18:20,001 pair it with the image so that every time that you take a photo, you now have a 238 00:18:20,001 --> 00:18:27,000 payload of not only the image pixels, but also this metadata.
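The capture step pairs the pixels with sensor metadata, and the talk goes on to describe hashing that payload on the device and signing the hash. The sketch below illustrates that flow under loud assumptions: the payload layout, the field names, and `DEVICE_KEY` are hypothetical, and HMAC-SHA256 from Python's standard library is swapped in for the firmware-protected signature described, so here the verifier must hold the same secret, unlike a real public-key signature.

```python
import hashlib
import hmac
import json

# HYPOTHETICAL stand-in for the key kept in the phone's secure hardware.
# HMAC-SHA256 (a symmetric MAC) replaces the device's asymmetric signature
# only so this sketch needs no third-party library.
DEVICE_KEY = b"hypothetical-device-key"


def build_payload(image: bytes, metadata: dict) -> bytes:
    """Pair the image pixels with the sensor metadata in one canonical blob."""
    record = {"image_sha256": hashlib.sha256(image).hexdigest(), "metadata": metadata}
    # Sorted keys and fixed separators make the bytes deterministic, so the
    # same capture always produces the same fingerprint.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def fingerprint(image: bytes, metadata: dict) -> str:
    """One hash covering both the pixels and the metadata."""
    return hashlib.sha256(build_payload(image, metadata)).hexdigest()


def sign(fp: str) -> str:
    return hmac.new(DEVICE_KEY, fp.encode(), hashlib.sha256).hexdigest()


def verify(image: bytes, metadata: dict, fp: str, sig: str) -> bool:
    """Recompute the fingerprint, then check it and the signature."""
    recomputed = fingerprint(image, metadata)
    return hmac.compare_digest(recomputed, fp) and hmac.compare_digest(sign(recomputed), sig)


pixels = b"\xff\xd8 hypothetical JPEG bytes"
meta = {"gps": [34.02, -118.28], "orientation_deg": [0.0, 12.5, 90.0],
        "taken_at": "2020-11-03T18:00:00Z"}
fp = fingerprint(pixels, meta)
sig = sign(fp)
print(verify(pixels, meta, fp, sig))                                            # True
print(verify(pixels, dict(meta, taken_at="2020-11-04T18:00:00Z"), fp, sig))     # False
```

Hashing the pixels and metadata together means that editing either one, even just the timestamp, breaks verification.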
Now, through 239 00:18:27,000 --> 00:18:31,001 our process of working with HTC, we've been able to take this payload and do 240 00:18:31,001 --> 00:18:35,000 something really special with it on the device, which is that we first 241 00:18:35,000 --> 00:18:39,000 create a hash of it on the device itself. And then we sign that hash with a 242 00:18:39,000 --> 00:18:43,000 cryptographic key that is protected by firmware on specialized 243 00:18:43,000 --> 00:18:48,000 hardware inside the phones. So what that means, really simply, is that now you 244 00:18:48,000 --> 00:18:53,001 have a unique fingerprint of both the image and the metadata, and we've signed 245 00:18:53,001 --> 00:18:59,000 it so that we know the fingerprint is secure. So with that payload of 246 00:18:59,000 --> 00:19:03,000 preservation information and all the metadata, we now take it and we put it on 247 00:19:03,000 --> 00:19:08,001 the decentralized web. That step begins by first creating a CID, 248 00:19:08,001 --> 00:19:14,001 which is a unique content identifier for that payload. And then we spread it out across 249 00:19:14,001 --> 00:19:19,000 the decentralized web, basically splitting it up into different pieces. And there we 250 00:19:19,000 --> 00:19:23,001 can store it on different types of nodes. So you could imagine academic 251 00:19:23,001 --> 00:19:29,000 institutions, nonprofits, enterprise cloud, even small devices like the 252 00:19:29,000 --> 00:19:36,000 Raspberry Pi or a personal computer, even a phone. All of these different 253 00:19:36,000 --> 00:19:39,000 nodes are, in our minds, appropriate for storage because we want to diversify 254 00:19:39,000 --> 00:19:43,001 storage. And that's really critical to our framework.
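Splitting a payload into pieces for different nodes, while still being able to tell when a node has altered its piece, can be sketched with a Merkle tree over the chunks. This is a simplified stand-in, not IPFS's actual chunking and CID format and not Filecoin's proof-of-spacetime (which additionally proves, through repeated proofs over time, that a provider keeps storing the data). The chunk size, function names, and sample payload are illustrative assumptions.

```python
import hashlib

# Illustrative only: real systems use chunks of many kilobytes.
CHUNK_SIZE = 8


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the payload into fixed-size pieces that different nodes could hold."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def merkle_root(pieces: list[bytes]) -> str:
    """Fold the piece hashes pairwise into one root that commits to every piece."""
    level = [h(p) for p in pieces] or [h(b"")]
    while len(level) > 1:
        if len(level) % 2:  # odd count: pair the last hash with itself
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


def tampered_pieces(expected: list[bytes], returned: list[bytes]) -> list[int]:
    """Positions whose returned piece no longer matches its recorded hash."""
    return [i for i, (want, got) in enumerate(zip(expected, returned)) if h(got) != want]


payload = b"Testimony 00042: transcript and capture metadata"  # hypothetical sample
pieces = split(payload)
root = merkle_root(pieces)            # short commitment recorded at upload time
piece_hashes = [h(p) for p in pieces]

# Suppose the node holding piece 2 silently alters its copy.
returned = list(pieces)
returned[2] = b"XXXXXXXX"

print(merkle_root(returned) == root)            # False
print(tampered_pieces(piece_hashes, returned))  # [2]
```

Only the short root needs to be recorded to detect that something changed; keeping the per-piece hashes as well lets you pinpoint which piece, and therefore which node, is responsible.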
In addition to that, we use 255 00:19:43,001 --> 00:19:48,000 cryptography and advanced proofs of spacetime, like the kind Filecoin uses, as an 256 00:19:48,000 --> 00:19:52,000 example, to ensure that as you spread information far and wide, you're also 257 00:19:52,000 --> 00:19:57,000 ensuring that its integrity is kept. And if any of the nodes that take 258 00:19:57,000 --> 00:20:01,001 the data manipulates it, we now have a way of proving that the 259 00:20:01,001 --> 00:20:06,001 manipulation has in fact occurred. Paradoxically, what this means is that as you spread 260 00:20:06,001 --> 00:20:10,001 information farther and wider, not only are you able to preserve the information 261 00:20:10,001 --> 00:20:15,000 better, but you're actually able to create a seal around the information. And 262 00:20:15,000 --> 00:20:19,000 the more nodes that join that network, the harder it is to break that 263 00:20:19,000 --> 00:20:24,001 seal. So that's our preservation story. But as you all know from working in the 264 00:20:24,001 --> 00:20:28,001 archival space, it doesn't end there. Just because you have a record of something 265 00:20:28,001 --> 00:20:32,001 that you can prove has not been manipulated, the contents of the objects still matter; 266 00:20:32,001 --> 00:20:37,000 they need to be examined and they need to be indexed. And so the expert 267 00:20:37,000 --> 00:20:41,001 certification of an item's content is something that is normally done 268 00:20:41,001 --> 00:20:43,001 through an archival process of indexing and verification. 269 00:20:44,000 --> 00:20:49,000 So we take those attestations, and those, too, we put on decentralized 270 00:20:49,000 --> 00:20:53,001 systems, so that the records of the authenticity and the verification of the 271 00:20:53,001 --> 00:21:00,000 content itself can also be preserved on a decentralized system. So that becomes
So that becomes 272 00:21:00,000 --> 00:21:07,000 basically the three parts of our system, capture, store, and verify. So as a last 273 00:21:07,000 --> 00:21:09,000 step, I want to show you where all this stuff is stored. 274 00:21:09,001 --> 00:21:14,001 In working with Adobe and Microsoft and the Linux Foundation, we've been helping 275 00:21:14,001 --> 00:21:18,001 pioneer a set of standards called the C2PA, which allow you to take all this 276 00:21:18,001 --> 00:21:24,000 information and actually put it directly, as an example, with a JPEG inside the 277 00:21:24,000 --> 00:21:28,001 photograph itself. So that now the photograph becomes a universe of information, 278 00:21:28,001 --> 00:21:33,001 not only of image pixel data, but also of these cryptographic proofs and also of 279 00:21:33,001 --> 00:21:38,001 these verifications. So now if, for instance, I, in this example, can use a small 280 00:21:38,001 --> 00:21:42,001 app, I can click on this eye and I can see all of the information around the 281 00:21:42,001 --> 00:21:47,000 photograph and its metadata. And I can also see the links back to where this 282 00:21:47,000 --> 00:21:50,001 information sits on the decentralized web. All of this just contained inside of 283 00:21:50,001 --> 00:21:54,001 the JPEG. So we think that's pretty nifty because it now changes every photograph 284 00:21:54,001 --> 00:21:59,001 from being just a photo, a container of image pixels, to now being a universe of 285 00:21:59,001 --> 00:22:06,000 information for fact checking, for image verification, etc, etc. All right, I'm 286 00:22:06,000 --> 00:22:09,001 going to close very quickly by just going through our prototype and some of our 287 00:22:09,001 --> 00:22:13,000 learnings. So I described you a little bit about the work that we did with 288 00:22:13,000 --> 00:22:17,001 Reuters, but it really was an unfolding set of experiments during the course of 289 00:22:17,001 --> 00:22:23,000 the 2020 election. 
And what we did is we used our technology to go out with 290 00:22:23,000 --> 00:22:26,001 photojournalists at Reuters, and I'll show you end to end how we established this 291 00:22:26,001 --> 00:22:31,001 new form of digital trust. So it began by having photos from this professional-grade 292 00:22:31,001 --> 00:22:35,000 camera move over to the phone, where they were notarized through the process 293 00:22:35,000 --> 00:22:39,000 I've described, and then they ended up in the CMS at Reuters headquarters, 294 00:22:39,000 --> 00:22:42,001 at their photo desk in London. And then from there, we took that information, 295 00:22:43,000 --> 00:22:48,000 which again included not only the photo, but also things like location and a 296 00:22:48,000 --> 00:22:52,001 hash of the image, all that complex metadata, and we were able to syndicate it 297 00:22:52,001 --> 00:22:59,000 out to different decentralized systems. So in this case, the first step was to 298 00:22:59,000 --> 00:23:03,001 syndicate it out to a private permissioned system with IBM. This is a 299 00:23:03,001 --> 00:23:07,001 form of blockchain technology called a private permissioned ledger. So 300 00:23:07,001 --> 00:23:10,001 that's the first step. And then the second step was we put it on a public 301 00:23:10,001 --> 00:23:14,000 permissionless ledger, which you can think of as something almost like 302 00:23:14,000 --> 00:23:18,001 Bitcoin, where we were able to store a hash of that information out on the 303 00:23:18,001 --> 00:23:22,000 public web as well. So this allowed us to preserve privacy and 304 00:23:22,000 --> 00:23:23,001 also have a public system of verification. 305 00:23:25,000 --> 00:23:27,000 I think we could never have imagined what we were actually going to 306 00:23:27,000 --> 00:23:29,000 capture during those 78 days.
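The two-ledger arrangement, a permissioned ledger holding the full record and a public one holding only a hash of it, can be sketched with a toy hash-chained log. This is an illustrative assumption, not IBM's system or Bitcoin: `Ledger`, `digest`, and the record fields are hypothetical, and there is no consensus or networking, only the append-only hash chaining that makes earlier entries tamper-evident.

```python
import hashlib
import json


def digest(obj: dict) -> str:
    """Deterministic SHA-256 of a JSON-serializable record."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


class Ledger:
    """Hypothetical toy append-only log: each entry commits to the previous one."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, payload: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry_hash = digest({"prev": prev, "payload": payload})
        self.entries.append({"prev": prev, "payload": payload, "hash": entry_hash})
        return entry_hash

    def is_intact(self) -> bool:
        """Re-walk the chain; editing any earlier entry breaks the links after it."""
        prev = self.GENESIS
        for e in self.entries:
            if e["prev"] != prev or digest({"prev": prev, "payload": e["payload"]}) != e["hash"]:
                return False
            prev = e["hash"]
        return True


# Hypothetical photo record; the field values are illustrative only.
photo = {"image_sha256": hashlib.sha256(b"hypothetical JPEG bytes").hexdigest(),
         "location": "hypothetical GPS fix", "desk": "photo desk"}

private_ledger = Ledger()   # permissioned: full record, restricted readers
public_ledger = Ledger()    # permissionless: only a fingerprint is published

private_ledger.append(photo)
public_ledger.append({"record_sha256": digest(photo)})

# Anyone later handed the record can check it against the public fingerprint
# without the public ledger revealing the record's contents.
print(any(e["payload"]["record_sha256"] == digest(photo) for e in public_ledger.entries))  # True
photo_altered = dict(photo, location="somewhere else")
print(any(e["payload"]["record_sha256"] == digest(photo_altered) for e in public_ledger.entries))  # False
```

Publishing only a hash keeps the record private only if its contents are hard to guess; a low-entropy record could be recovered by hashing candidate values, which is why real designs often add a random salt.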
307 00:23:29,001 --> 00:23:31,001 I think they caught all of us by surprise in terms of 308 00:23:31,001 --> 00:23:33,000 how historic they proved to be. 309 00:23:34,000 --> 00:23:39,000 But what I can say is that the efforts of our technology development 310 00:23:39,000 --> 00:23:43,000 certainly weighed very heavily on us as we thought about what we were 311 00:23:43,000 --> 00:23:47,001 doing and about the restoration of trust. Because surely I think we 312 00:23:47,001 --> 00:23:50,001 can all agree, no matter what side of the aisle we're on, that the demonization 313 00:23:50,001 --> 00:23:54,001 of the free press is something that we should all strive to end. 314 00:23:55,001 --> 00:23:59,001 And hopefully, the work we're doing in creating an archive that can withstand 315 00:24:00,000 --> 00:24:05,000 the challenges of misinformation and of manipulation through 316 00:24:05,000 --> 00:24:09,001 social media is a really good step in that direction. You can check out 317 00:24:09,001 --> 00:24:14,001 the website yourself, starlinglab.org, 78 Days. It'll give you a chance to play 318 00:24:14,001 --> 00:24:18,001 around with the archive and also see more in-depth explanations of the 319 00:24:18,001 --> 00:24:25,001 technology. I'll wrap up here with a last minute on our learnings. You 320 00:24:25,001 --> 00:24:29,001 matter. That's probably the biggest thing I can mention: institutions 321 00:24:29,001 --> 00:24:33,001 like libraries and archives are a key part of creating a solution that is 322 00:24:33,001 --> 00:24:39,000 networked, and as a community, if we can all come together to guarantee the 323 00:24:39,000 --> 00:24:42,001 integrity of information, we're in a unique position to create a new foundation 324 00:24:42,001 --> 00:24:47,000 of digital trust.
So it takes that form of collaboration. And really, when we 325 00:24:47,000 --> 00:24:51,000 think about decentralization, it's not a single destination but an 326 00:24:51,000 --> 00:24:55,000 unfolding process in which we continually strive to bring more and more diverse 327 00:24:55,000 --> 00:25:00,001 nodes into our system. And the more diverse those nodes are, the better 328 00:25:00,001 --> 00:25:04,001 they're going to be able to store and verify information. That's really 329 00:25:04,001 --> 00:25:08,001 why you might think of multiple ledgers and multiple decentralized systems coming 330 00:25:08,001 --> 00:25:14,000 into play, because they allow for a tremendous amount of diversification of 331 00:25:14,000 --> 00:25:18,001 cryptographic features, of performance, of methods of preservation, and last, of 332 00:25:18,001 --> 00:25:25,001 course, diversity of use. Think of decentralization a lot like biodiversity. This is 333 00:25:25,001 --> 00:25:30,000 how we get resilience as a community, both at a technical level and at a 334 00:25:30,000 --> 00:25:33,000 community level. Right. With that, I'll pass it back 335 00:25:33,000 --> 00:25:34,001 to Wendy. Thanks so much for having me. 336 00:25:36,000 --> 00:25:40,000 Thank you, Jonathan. We have some questions, some really good questions. One 337 00:25:40,000 --> 00:25:43,000 question is: how does this actually differ from 338 00:25:43,000 --> 00:25:45,001 BitTorrent? Which is a very good question. 339 00:25:46,000 --> 00:25:49,001 There are a lot of similarities, actually. BitTorrent works by syndicating 340 00:25:49,001 --> 00:25:54,000 information across multiple different nodes. One of the big differences in our 341 00:25:54,000 --> 00:25:57,000 work is that we choose nodes.
342 00:25:57,000 --> 00:26:01,001 Whereas BitTorrent is meant to be diffuse and random in how information is 343 00:26:01,001 --> 00:26:06,000 spread, and it's optimized basically at the protocol level, we think about 344 00:26:06,000 --> 00:26:11,001 decentralization as a process in which archives should have a role in 345 00:26:11,001 --> 00:26:16,001 choosing which nodes they distribute their information to. That's a major 346 00:26:16,001 --> 00:26:21,000 distinction. Is your tech open source? And can you point us 347 00:26:21,000 --> 00:26:27,001 to an open-source technology? 348 00:26:28,000 --> 00:26:33,000 As for our prototypes, we're in the process of putting out various parts of our 349 00:26:33,000 --> 00:26:37,000 code base, but really we haven't created any novel technology. We've just created 350 00:26:37,000 --> 00:26:42,000 a novel implementation. So I'd be very happy to refer you to our website. And 351 00:26:42,000 --> 00:26:46,000 if you want to reach out, I can give you a list of the different protocols that 352 00:26:46,000 --> 00:26:47,001 we've used. All of those are open source. 353 00:26:48,000 --> 00:26:52,000 And we are very firmly committed to being part of an open-source ecosystem, 354 00:26:52,000 --> 00:26:57,000 both as contributors and as publishers. So Jonathan, what's the name of that 355 00:26:57,000 --> 00:26:59,001 JPEG embedded metadata standard? 356 00:27:00,000 --> 00:27:06,001 Librarians are very keen on helping to create better metadata. Sure. The link 357 00:27:06,001 --> 00:27:11,000 is actually there; I see Heather's put it in. It's C2PA. And 358 00:27:12,000 --> 00:27:16,000 there's a very welcoming and open environment there for people to weigh in. I 359 00:27:16,000 --> 00:27:20,000 think archivists are a key part of helping us come up with a standard that's 360 00:27:20,000 --> 00:27:26,000 going to be useful for them.
So we'd be really happy for people to contribute to 361 00:27:26,000 --> 00:27:29,001 that standard. It's based out of the Linux Foundation, so it too 362 00:27:29,001 --> 00:27:31,000 has open-source commitments. 363 00:27:33,000 --> 00:27:40,000 Someone asked: are you licensing software? As an organization, no. We're a lab 364 00:27:40,000 --> 00:27:45,001 that's experimenting to help create some of the art of the possible. And we have 365 00:27:45,001 --> 00:27:49,001 various partners that we work with. Almost all of them are fully 366 00:27:49,001 --> 00:27:51,000 transparent and open source in their work. 367 00:27:51,001 --> 00:27:55,001 That's a key criterion in working with them. In that way, there are really no 368 00:27:55,001 --> 00:28:00,000 complications with the licensing. You can use it, you can fork it, et cetera. 369 00:28:00,000 --> 00:28:04,001 Nicholas Taylor mentions that you had brought up incentives and contracts as 370 00:28:04,001 --> 00:28:09,001 mechanisms for ensuring persistence. Can you elaborate on how persistence is 371 00:28:09,001 --> 00:28:16,000 assured or supported in the Starling framework? Sure. Remember, we're a 372 00:28:16,000 --> 00:28:18,000 framework that helps you make better choices. 373 00:28:18,000 --> 00:28:23,000 We use a variety of different protocols. We 374 00:28:23,000 --> 00:28:26,000 don't endorse any of them as best practices, but we're experimenting with them to 375 00:28:26,000 --> 00:28:31,000 understand how they could achieve persistence. If you look at 376 00:28:31,000 --> 00:28:37,000 what's currently out there, I'd caution people that there are some big promises 377 00:28:37,000 --> 00:28:41,000 being made about immutability, persistence, and permanence.
378 00:28:41,001 --> 00:28:46,001 We as a lab try to avoid those words, because we're concerned that with any of 379 00:28:46,001 --> 00:28:50,001 these technologies and communities, history shows that really 380 00:28:50,001 --> 00:28:52,001 nothing can be guaranteed to be permanent. 381 00:28:53,001 --> 00:28:57,000 It really takes active effort to ensure that type of thing. Now, what's 382 00:28:57,000 --> 00:29:01,001 new is that you have these incentive layers that could potentially allow 383 00:29:01,001 --> 00:29:04,001 people to think about creating endowments, for instance, that could 384 00:29:04,001 --> 00:29:08,001 persist for years and years if they're carefully architected and if the economics 385 00:29:08,001 --> 00:29:15,000 bear out. In all of these cases, whether it's Filecoin or Arweave (people from 386 00:29:15,000 --> 00:29:18,000 Storj are here as well), they can talk to you about how you can use some of 387 00:29:18,000 --> 00:29:22,000 those incentives to help ensure that the people hosting information are 388 00:29:22,000 --> 00:29:27,000 incentivized to do so long term. But the reality is that it's never a passive 389 00:29:27,000 --> 00:29:31,000 effort. Data owners and archivists like you have to be involved in 390 00:29:31,000 --> 00:29:33,000 helping architect some of those best practices. 391 00:29:34,000 --> 00:29:37,000 And you shouldn't gloss over the details, because it's really important that 392 00:29:37,000 --> 00:29:41,001 everyone understands what the incentive mechanisms and security mechanisms are 393 00:29:41,001 --> 00:29:47,001 there. We have some very knowledgeable questioners here. Kiernan, as 394 00:29:47,001 --> 00:29:52,000 someone more familiar with LTO storage and with trusting hashes and bag manifests, asks: 395 00:29:52,000 --> 00:29:59,000 is the idea here that these are not trustworthy enough in certain contexts?
To be 396 00:29:59,000 --> 00:30:02,001 clear, I'm not as familiar with LTO storage, so Kiernan, you can help enlighten me. 397 00:30:03,000 --> 00:30:06,000 But what we've found is that typically, most archiving 398 00:30:06,000 --> 00:30:08,000 organizations just have hashes. 399 00:30:08,001 --> 00:30:13,001 They store hashes, like a SHA-256, of their underlying data. And that is 400 00:30:13,001 --> 00:30:18,001 not enough, because unless you sign that information, you don't really have a way 401 00:30:18,001 --> 00:30:22,000 of protecting those hashes and ensuring their integrity. So we're 402 00:30:22,000 --> 00:30:27,001 providing not only hashing and signing, but also a way of putting that 403 00:30:27,001 --> 00:30:31,001 information on a decentralized ledger. Think of it as belt and 404 00:30:31,001 --> 00:30:35,000 suspenders. We're not taking anything for granted about the 405 00:30:35,000 --> 00:30:39,000 integrity of the hash. Instead, we're finding multiple layers of trust that we 406 00:30:39,000 --> 00:30:42,001 can put on top of it, so that when we look back, 407 00:30:42,001 --> 00:30:46,001 let's say in 50 years, we know that the hash was properly created 408 00:30:46,001 --> 00:30:51,001 and secured over time. Jonathan, I don't know if you've ever thought about 409 00:30:51,001 --> 00:30:55,001 this, but you're speaking to a lot of people from memory institutions like 410 00:30:55,001 --> 00:31:01,000 libraries and museums. Looking to the future, where do you see decentralized storage 411 00:31:01,000 --> 00:31:03,001 applied in their world? 412 00:31:06,000 --> 00:31:09,001 I like to think about this when I talk to the folks at the Shoah Foundation who are 413 00:31:09,001 --> 00:31:14,000 on the archiving side. I like to put their minds at ease and say, I think this is 414 00:31:14,000 --> 00:31:19,000 a backup to the backup.
What I mean by that is that the starting point is 415 00:31:19,000 --> 00:31:23,000 really cold storage, and it's diffuse. That means it's going to take 416 00:31:23,000 --> 00:31:27,001 time to reconstitute these types of archives if we need to have a restore event. 417 00:31:28,000 --> 00:31:32,000 And that's okay, because thinking about how you can diversify across 418 00:31:32,000 --> 00:31:37,001 organizations and geography is actually a great form of resilience. And if it takes a 419 00:31:37,001 --> 00:31:42,001 little bit longer to get this backup of a backup back in your hands, I'd argue 420 00:31:42,001 --> 00:31:44,000 that it's still really valuable. 421 00:31:45,000 --> 00:31:49,000 Having been part of many technology organizations over the last 20 years, I can't 422 00:31:49,000 --> 00:31:52,000 tell you how many times we've been in a situation where we've trusted our vendor 423 00:31:52,000 --> 00:31:57,000 and trusted all the preparations we've made, and in the end, the server that was 424 00:31:57,000 --> 00:32:01,000 still standing was the one offline in the middle of nowhere that someone 425 00:32:01,000 --> 00:32:06,000 forgot even existed. That type of 426 00:32:06,000 --> 00:32:09,000 serendipity is something you don't want to bank on. Instead, you want to 427 00:32:09,000 --> 00:32:14,001 think a little bit ahead. And these types of systems, in their 428 00:32:14,001 --> 00:32:19,000 current state, can really function in that way. I would say 429 00:32:19,000 --> 00:32:25,000 they sit outside your traditional, performant forms of storage and 430 00:32:25,000 --> 00:32:29,001 are instead a new way to think about preservation. As these technologies 431 00:32:29,001 --> 00:32:34,000 mature, we can start to move them up in our priority and reliability.
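Jonathan's earlier answer to Kiernan, that a bare SHA-256 in a manifest is not enough unless it is signed, can be illustrated with a short Python sketch. It assumes an attacker who can rewrite both a file and its manifest. As a standard-library stand-in for a real digital signature, it uses an HMAC keyed by a secret the archive keeps elsewhere; a production system would more likely use public-key signatures (for example Ed25519) so that anyone can verify without being able to forge. The names are hypothetical, and this is not the Starling framework's actual scheme.

```python
import hashlib
import hmac
import secrets


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fixity_tag(data: bytes, key: bytes) -> str:
    """HMAC-SHA256 over the file's SHA-256 digest (stand-in for a signature)."""
    return hmac.new(key, hashlib.sha256(data).digest(), hashlib.sha256).hexdigest()


def tag_ok(data: bytes, tag: str, key: bytes) -> bool:
    return hmac.compare_digest(fixity_tag(data, key), tag)


archive_key = secrets.token_bytes(32)  # kept apart from the storage being protected
original = b"original oral-history transcript"
keyed_manifest = fixity_tag(original, archive_key)

# An attacker with write access swaps the file and regenerates a bare-hash manifest.
tampered = b"altered transcript"
bare_manifest = sha256_hex(tampered)
print(sha256_hex(tampered) == bare_manifest)  # True: the bare hash still "verifies"

# Without archive_key, the attacker can only guess at a tag.
forged_tag = fixity_tag(tampered, secrets.token_bytes(32))
print(tag_ok(tampered, forged_tag, archive_key))      # False
print(tag_ok(original, keyed_manifest, archive_key))  # True
```

The design point is that the integrity check depends on something the attacker cannot recompute; anchoring the tag or signature on a ledger, as Jonathan describes, adds a second, independent layer.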
432 00:32:34,001 --> 00:32:37,001 Thanks so much for joining us and for the great work you're doing 433 00:32:37,001 --> 00:32:39,000 with so many different organizations. 434 00:32:40,000 --> 00:32:42,000 Likewise, Wendy, we're always inspired by you as 435 00:32:42,000 --> 00:32:44,000 well. Cheers. Thanks for having me. 436 00:32:44,000 --> 00:32:48,001 Okay, well, let's go on to see some demos. What Jonathan was talking 437 00:32:48,001 --> 00:32:54,001 about was cold storage, but what if you wanted active storage at scale? We're 438 00:32:54,001 --> 00:33:00,000 going to show you two projects that experiment with that. First, I'd 439 00:33:00,000 --> 00:33:04,001 like to introduce you to Arkady Kukarkin. He is one of the top D-Web engineers 440 00:33:04,001 --> 00:33:09,001 working today, and we are so honored and pleased that he works with us at the 441 00:33:09,001 --> 00:33:15,000 Archive. He was the founding CTO of an organization called Mediachain, which used 442 00:33:15,000 --> 00:33:20,001 blockchains to authenticate the provenance of music. He also worked for 443 00:33:20,001 --> 00:33:26,001 Protocol Labs, which is the parent company of Filecoin. We gave Arkady this 444 00:33:26,001 --> 00:33:31,000 experiment to work on: could you take a different type of data file, in this 445 00:33:31,000 --> 00:33:36,001 case WARCs, or web archive files, and store them at scale across the 446 00:33:36,001 --> 00:33:41,001 Filecoin network? We chose this collection, the End of Term Archive from 447 00:33:41,001 --> 00:33:47,000 2016. That was at the end of the Obama administration and the beginning of the 448 00:33:47,000 --> 00:33:52,000 Trump administration, and it gathered together the entire federal web presence, every 449 00:33:52,000 --> 00:33:58,001 .gov and .mil website at that time.
It was a collaborative collection: the 450 00:33:58,001 --> 00:34:02,001 Library of Congress, Stanford, the California Digital Library, and many other institutions 451 00:34:02,001 --> 00:34:07,001 worked together with the Internet Archive to pull it together. It's about 200 452 00:34:07,001 --> 00:34:11,001 terabytes. If you were going to replicate it three times, that's 453 00:34:11,001 --> 00:34:13,000 600 terabytes you'd need. 454 00:34:13,001 --> 00:34:20,000 It's about 20,000 items, a million files, and billions of individual URLs. So Arkady, 455 00:34:20,000 --> 00:34:27,000 can you show us how it's going? Hello. My name is Arkady Kukarkin, and 456 00:34:27,000 --> 00:34:34,000 I'm going to show you how our experiment is going so far. Let's just 457 00:34:34,000 --> 00:34:40,001 get started. We primarily use two technologies here, IPFS and Filecoin. IPFS 458 00:34:40,001 --> 00:34:45,001 you can think of as a way to locate and retrieve content through a peer-to-peer 459 00:34:45,001 --> 00:34:51,000 network, and Filecoin as a way to ensure, or at least attempt to 460 00:34:51,000 --> 00:34:57,001 ensure, the long-term preservation of that content. Probably the best way to 461 00:34:57,001 --> 00:35:03,001 dive in is to look at a simple example. I have 462 00:35:03,001 --> 00:35:09,000 IPFS enabled in my browser here; it's Brave, but you can also install an extension 463 00:35:09,000 --> 00:35:12,000 to do this in other browsers as well. 464 00:35:13,000 --> 00:35:18,000 We can take a look at my node here. Here are some stats, but the most 465 00:35:18,000 --> 00:35:23,000 interesting thing is probably the peer list, which may take a second to populate, 466 00:35:23,001 --> 00:35:30,001 but you can see I'm connected to almost 1,400 peers throughout the world.
As 467 00:35:30,001 --> 00:35:37,000 they come up, we actually see some in Russia and Ukraine as well, 468 00:35:37,001 --> 00:35:41,000 which is an interesting demonstration of the resiliency of these peer-to-peer 469 00:35:41,000 --> 00:35:46,000 connections, because, as you know, web traffic to those places 470 00:35:46,000 --> 00:35:47,001 is currently disrupted. 471 00:35:48,001 --> 00:35:53,000 So let's take a look at a simple image file here on the Metro website. 472 00:35:54,000 --> 00:35:59,001 We can import it into IPFS just like any normal file. 473 00:36:00,000 --> 00:36:01,001 Okay, here it is. 474 00:36:03,001 --> 00:36:10,000 And let's take a look. So IPFS, bam. Here's our 475 00:36:10,000 --> 00:36:15,001 image, and you can see a sort of funny-looking URL here at the top. Hopefully you 476 00:36:15,001 --> 00:36:22,000 can read it, but instead of HTTP, we have IPFS, and then we have this sort of 477 00:36:22,000 --> 00:36:28,000 scary-looking long identifier. What happened here is that the file was loaded 478 00:36:28,000 --> 00:36:34,000 into my local node, hashed, and made available to the entire IPFS network. 479 00:36:35,000 --> 00:36:42,000 So if anyone, pretty much anywhere in the world, were to enter this IPFS URL, 480 00:36:42,001 --> 00:36:46,001 they would be able to access this file, maybe from my machine, maybe from another 481 00:36:46,001 --> 00:36:50,000 machine that also happens to have it, maybe from an intermediate node, 482 00:36:50,001 --> 00:36:56,000 someone in that network of 1,400 machines that I showed you. I think this is 483 00:36:56,000 --> 00:37:03,000 already cool, because you're able to access a file simply by its 484 00:37:03,000 --> 00:37:08,000 identifier, the CID, which Jonathan already mentioned, without knowing or really 485 00:37:08,000 --> 00:37:14,000 caring where it came from.
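To make "an identifier derived from the content" concrete, here is a minimal Python sketch that builds a CIDv1 by hand: a SHA-256 multihash wrapped with the raw-bytes codec and rendered in lowercase base32 with the multibase prefix `b`. This is a simplification under stated assumptions. It should match what an IPFS node reports only when a file fits in a single raw block and is added as CIDv1 with raw leaves; larger files, and the default CIDv0 import, are chunked into a UnixFS DAG and get a different root CID. The helper names are hypothetical.

```python
import base64
import hashlib

CID_VERSION = 1
RAW_CODEC = 0x55       # multicodec code for raw binary
SHA2_256_CODE = 0x12   # multihash code for sha2-256


def uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used by multiformats."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def cid_v1_raw(data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    multihash = uvarint(SHA2_256_CODE) + uvarint(len(digest)) + digest
    cid_bytes = uvarint(CID_VERSION) + uvarint(RAW_CODEC) + multihash
    # Multibase 'b' = RFC 4648 base32, lowercase, no padding.
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


print(cid_v1_raw(b"hello"))
print(cid_v1_raw(b"hello") == cid_v1_raw(b"hello"))   # True: same bytes, same CID
print(cid_v1_raw(b"hello") == cid_v1_raw(b"hello!"))  # False: any change alters it
```

Because the identifier is a function of the bytes alone, it does not matter which machine serves the file, which is exactly the property Arkady demonstrates.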
The reason that works is that the CID 486 00:37:14,000 --> 00:37:20,001 (it's a little bit truncated here, but this 487 00:37:21,001 --> 00:37:26,001 long string) is in fact an encoding of a content hash, which, 488 00:37:26,001 --> 00:37:28,001 again, Jonathan mentioned. 489 00:37:29,000 --> 00:37:33,001 We're not applying as rigid a standard here: it's not a signed hash. But 490 00:37:33,001 --> 00:37:38,001 nonetheless, if you request this particular identifier, you are pretty much 491 00:37:38,001 --> 00:37:44,001 guaranteed to get the exact same file back. I think that's already pretty 492 00:37:44,001 --> 00:37:51,001 cool if we think about something like the lifetime of hyperlinks 493 00:37:51,001 --> 00:37:57,001 in research papers. This is just a graphic I pulled up. After just a 494 00:37:57,001 --> 00:38:04,001 few years, something close to 50% of all hyperlinks across academic papers are no 495 00:38:04,001 --> 00:38:09,000 longer resolvable. Maybe the content exists elsewhere, say the Internet Archive has 496 00:38:09,000 --> 00:38:13,000 archived a copy in the Wayback Machine, or someone else has a copy, but the 497 00:38:13,000 --> 00:38:20,000 actual link is broken and needs to be manually fixed or followed, and trust has 498 00:38:20,000 --> 00:38:26,001 to be ensured. Now imagine the same paper using these references instead of 499 00:38:26,001 --> 00:38:31,001 traditional URLs: it will just work as long as another copy is available in the 500 00:38:31,001 --> 00:38:38,001 network. So let's move on to a real example. The data set 501 00:38:38,001 --> 00:38:42,001 that Wendy mentioned is the End of Term Web Archive; we're using the 2016 502 00:38:42,001 --> 00:38:49,000 version, which I think is probably a relatively hot set, as it were. 503 00:38:49,001 --> 00:38:56,001 And here's a copy that's just available on the web.
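The claim that a content-addressed reference "will just work as long as another copy is available" rests on the client checking what it receives. The sketch below is hypothetical and simplified: it uses a bare SHA-256 hex digest as the address rather than a full CID, and models peers as plain Python functions instead of real IPFS networking. It tries each peer in turn and accepts only bytes whose hash matches the requested address, so a dead origin server or a corrupted mirror does not matter as long as one honest copy exists.

```python
import hashlib
from typing import Callable, Iterable, Optional

# A "peer" is modeled as a function: address -> bytes, or None if it lacks the data.
Peer = Callable[[str], Optional[bytes]]


def content_address(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fetch_by_address(address: str, peers: Iterable[Peer]) -> Optional[bytes]:
    """Return the first peer response whose hash matches the requested address."""
    for peer in peers:
        data = peer(address)
        if data is not None and content_address(data) == address:
            return data
    return None  # no peer had a valid copy


cited_file = b"dataset cited in a research paper"
address = content_address(cited_file)


def dead_origin(_: str) -> Optional[bytes]:
    return None  # the original server is gone: a classic broken link


def corrupted_mirror(_: str) -> Optional[bytes]:
    return b"bytes that were altered in transit or at rest"


def honest_mirror(requested: str) -> Optional[bytes]:
    return cited_file if requested == address else None


print(fetch_by_address(address, [dead_origin, corrupted_mirror, honest_mirror]) == cited_file)  # True
print(fetch_by_address(address, [dead_origin, corrupted_mirror]))  # None
```

Verification happens on the client, which is why the reference does not need to name a trusted host.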
You can 504 00:38:57,001 --> 00:39:03,001 load a page here; it's a little bit slow, but here we are: here's the Indianapolis 505 00:39:03,001 --> 00:39:10,001 FBI Bureau in fall of 2016. And here's 506 00:39:10,001 --> 00:39:17,000 what the backing data looks like. This is just a whole lot of 507 00:39:17,000 --> 00:39:24,000 roughly gigabyte-sized WARC web archives. Just as before, we 508 00:39:24,000 --> 00:39:30,000 have the CID identifier, and we can pull it up. 509 00:39:30,001 --> 00:39:37,001 In fact, we can load it into some tools that have already added 510 00:39:37,001 --> 00:39:43,000 IPFS loading support, such as replayweb.page, which is itself just a static 511 00:39:43,000 --> 00:39:49,001 site that loads from IPFS as well, and lets you browse the collection. 512 00:39:50,000 --> 00:39:57,000 So that's already pretty cool. If you're a researcher or an archivist, you 513 00:39:57,000 --> 00:40:02,001 may already de facto have a copy of this from having accessed it. So we have lots of 514 00:40:02,001 --> 00:40:07,001 copies keeping stuff safe. But is it safe enough? I think in this case, 515 00:40:07,001 --> 00:40:12,000 probably not, because this is important data, but it's a 516 00:40:12,000 --> 00:40:18,000 very large amount of data, and it's data that will probably sit around, 517 00:40:18,000 --> 00:40:24,001 mostly not looked at, until it's actually needed. So what do we do? 518 00:40:25,000 --> 00:40:31,001 Well, one solution is Filecoin. We're using a tool called Estuary. 519 00:40:31,001 --> 00:40:37,000 Estuary is one of several clients for the Filecoin network. What it does is 520 00:40:37,000 --> 00:40:43,000 essentially manage storage deals within Filecoin. In Filecoin, the 521 00:40:44,000 --> 00:40:46,000 basic primitive is a deal.
522 00:40:47,000 --> 00:40:51,001 It is made between you as the client and any number of storage providers. 523 00:40:52,000 --> 00:40:58,001 Here is a global map of the storage providers currently online. 524 00:40:58,001 --> 00:41:02,001 At the end of the day, I care about where they're located. But because of the 525 00:41:02,001 --> 00:41:08,000 promises of the network and the protocol, I don't actually care who I'm talking 526 00:41:08,000 --> 00:41:13,001 to exactly, because storage integrity is a protocol-level primitive. 527 00:41:13,001 --> 00:41:20,000 Here we have 3x replication across some files, and we can take a look 528 00:41:20,000 --> 00:41:27,000 here. The bright green ones are fully online. Some of these others have 529 00:41:27,000 --> 00:41:33,000 actually shown storage faults, and the Estuary system has gone ahead and 530 00:41:33,000 --> 00:41:38,001 recreated these additional replicas. They're now in a process known as 531 00:41:38,001 --> 00:41:42,001 sealing. And we can take a look. Here we have a provider. 532 00:41:45,000 --> 00:41:51,001 I don't actually know much about this provider, but we can take a look. Here 533 00:41:51,001 --> 00:41:58,000 they are: this is a replica that we have in Montreal. So that's great. 534 00:41:59,000 --> 00:42:04,000 I'd like to make a very quick note here: the word "coin" might make you 535 00:42:04,000 --> 00:42:09,000 think of energy usage and danger to the environment. That is a 536 00:42:09,000 --> 00:42:11,000 very reasonable concern. 537 00:42:11,001 --> 00:42:15,000 The important thing to realize with Filecoin is that it does not use the 538 00:42:15,000 --> 00:42:21,000 wasteful proof-of-work mechanism of Bitcoin. The ongoing data 539 00:42:21,000 --> 00:42:25,001 verification that happens at the protocol level also ensures the integrity of the 540 00:42:25,001 --> 00:42:27,001 network. You can read more about it at this link.
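The repair behavior Arkady shows, where faulted replicas are replaced so a file stays at 3x, can be sketched as a small reconciliation loop. This is a hypothetical Python model, not Estuary's code or any Filecoin API: providers are plain strings rather than real provider IDs, a replica's status is one of "active", "faulted", or "sealing", and new replicas start in "sealing" to mirror the sealing step mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    provider: str            # hypothetical provider label, not a real Filecoin ID
    status: str = "active"   # "active", "faulted", or "sealing"


@dataclass
class FileRecord:
    cid: str
    replicas: list[Replica] = field(default_factory=list)


def repair(record: FileRecord, target: int, candidates: list[str]) -> list[str]:
    """Drop faulted replicas and open new ones until `target` non-faulted copies exist.

    Returns the providers chosen for new replicas.
    """
    faulted = {r.provider for r in record.replicas if r.status == "faulted"}
    record.replicas = [r for r in record.replicas if r.status != "faulted"]
    in_use = {r.provider for r in record.replicas}
    new_providers = []
    for provider in candidates:
        if len(record.replicas) >= target:
            break
        if provider in in_use or provider in faulted:
            continue  # keep copies on distinct providers; skip ones that just failed
        record.replicas.append(Replica(provider, status="sealing"))
        in_use.add(provider)
        new_providers.append(provider)
    return new_providers


record = FileRecord("example-cid", [
    Replica("provider-a"),
    Replica("provider-b", status="faulted"),
    Replica("provider-c"),
])
print(repair(record, target=3, candidates=["provider-b", "provider-a", "provider-d", "provider-e"]))
# ['provider-d']
print([(r.provider, r.status) for r in record.replicas])
# [('provider-a', 'active'), ('provider-c', 'active'), ('provider-d', 'sealing')]
```

Counting "sealing" replicas toward the target avoids opening duplicate deals while sealing is in progress; a stricter policy might count only "active" replicas.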
541 00:42:28,000 --> 00:42:34,001 And you can look at the voluntary energy disclosures on the Filecoin energy dashboard. 542 00:42:34,001 --> 00:42:41,000 Of course, there are many other systems that attempt to solve these 543 00:42:41,000 --> 00:42:47,000 problems as well. There's IPFS Cluster, which is a sort of collaborative 544 00:42:47,000 --> 00:42:53,001 backup solution. There's Textile, which is a major Filecoin client tool. 545 00:42:54,000 --> 00:43:00,001 There's Storj, which will be up right next. There's Arweave, which aims to 546 00:43:00,001 --> 00:43:06,001 achieve long-term or potentially infinite storage for a finite upfront cost, 547 00:43:06,001 --> 00:43:13,001 which is 548 00:43:13,001 --> 00:43:19,000 a public Estuary node that's already hosting hundreds of terabytes of data 549 00:43:19,000 --> 00:43:20,001 for its users. 550 00:43:21,000 --> 00:43:28,000 And I think that's it. Thank you. Thank you so much, Arkady. You'll be hanging 551 00:43:28,000 --> 00:43:32,001 out with us later if people have more questions, and you could probably answer 552 00:43:32,001 --> 00:43:35,001 some questions right there in the chat. We're going to come back to questions 553 00:43:35,001 --> 00:43:41,001 with you and Dominic. So let's move on, though, to our second demonstration. I'd 554 00:43:41,001 --> 00:43:46,000 like to introduce you to Dominic Marino. He's the Senior Solutions Architect at 555 00:43:46,000 --> 00:43:51,000 Storj. Storj is probably the oldest decentralized storage company out there. 556 00:43:51,001 --> 00:43:56,000 With Storj, the Internet Archive has been working to store LibriVox 557 00:43:56,000 --> 00:43:57,001 audiobooks at scale. 558 00:43:58,000 --> 00:44:01,001 So here to show us how that work is going, please welcome Dominic Marino. 559 00:44:03,000 --> 00:44:07,001 Thank you so much, Wendy. Very excited to be here speaking with everyone today.
560 00:44:08,000 --> 00:44:13,000 I'm Dominic, a Solutions Architect at Storj, and we're one of the leading 561 00:44:13,000 --> 00:44:19,001 providers of decentralized storage. We're very proud of our track record over the 562 00:44:19,001 --> 00:44:25,001 last, oh, goodness, it's been about eight years since Shawn Wilkinson founded us 563 00:44:25,001 --> 00:44:32,000 in 2014 in his dorm room. We've been really excited to work with the Internet 564 00:44:32,000 --> 00:44:38,001 Archive on decentralizing the LibriVox audiobook series. It's a collection of 565 00:44:38,001 --> 00:44:45,000 over 16,000 titles and approximately 22 terabytes of data. I've worked very 566 00:44:45,000 --> 00:44:52,000 closely with Arkady and have had a great time learning with him as we grow this 567 00:44:52,000 --> 00:44:57,001 at scale, bringing these massive collections into Storj. I'm happy to show 568 00:44:57,001 --> 00:45:01,001 what we've done today. First I'm going to tell you what we've 569 00:45:01,001 --> 00:45:05,000 done, and then I'm going to show you how we did it and explain 570 00:45:05,000 --> 00:45:11,001 how our network functions. At Storj, we're a decentralized storage 571 00:45:11,001 --> 00:45:17,000 provider with over 13,000 nodes on our network, of which over 572 00:45:17,000 --> 00:45:19,000 9,000 are run by independent node operators. 573 00:45:19,001 --> 00:45:26,000 When you upload a file into our ecosystem, you encrypt it, then you split it, and 574 00:45:26,000 --> 00:45:30,001 then you distribute it out to those tens of thousands of nodes. This gives you, 575 00:45:30,001 --> 00:45:35,001 the consumer, ultimately the control, and allows you to remain, if you choose to, 576 00:45:36,000 --> 00:45:37,001 the custodian of the private key. 577 00:45:38,000 --> 00:45:44,000 In full disclosure, we do work in both the Web 2 and Web 3 spaces.
So we're engaged 578 00:45:44,000 --> 00:45:50,000 on a daily basis in Web 3 projects in this space, as well as 579 00:45:50,000 --> 00:45:55,001 offering edge services that allow organizations in the Web 2 space to get 580 00:45:55,001 --> 00:46:00,000 the inherent benefits of the Web 3 space. 581 00:46:00,001 --> 00:46:05,000 That means you can have a product today that uses something like Amazon's S3 582 00:46:05,000 --> 00:46:10,000 storage and benefit from the redundancy, the 583 00:46:10,000 --> 00:46:15,000 performance, and the value that decentralized storage brings, while still in the Web 2 584 00:46:15,000 --> 00:46:21,000 space. So we're really focused on pushing forward, on being forward-leaning, while 585 00:46:21,000 --> 00:46:26,000 still offering a very usable service for all different sorts of organizations. 586 00:46:27,000 --> 00:46:31,001 I'm going to jump right into a quick demo and show you some things we've 587 00:46:31,001 --> 00:46:38,001 accomplished, as well as a very simple way to use our product. To 588 00:46:38,001 --> 00:46:44,000 do that, I'm going to do a quick demo of uploading a file here. 589 00:46:44,000 --> 00:46:49,001 First, I'm going to pop over into our product and go to our bucket. 590 00:46:50,000 --> 00:46:53,000 This isn't the only way to interact with our network, but it's one way you can 591 00:46:53,000 --> 00:46:58,000 interact with it. So today, I'm just going to go into this bucket, and 592 00:46:58,000 --> 00:47:02,001 I'm going to put in a super secure passphrase. I understand that I 593 00:47:02,001 --> 00:47:05,001 need to remember that passphrase because I'm its custodian, and the service 594 00:47:05,001 --> 00:47:12,000 will not remember it. I'm now in the bucket, and I'm going to upload that 595 00:47:12,000 --> 00:47:19,000 file.
Once that file uploads, I'm going to create a share link, paste 596 00:47:19,000 --> 00:47:25,001 that share link in, and view it. This is an edge service we're running that 597 00:47:25,001 --> 00:47:32,001 allows you to share items with anyone you wish. I'm going to post the 598 00:47:32,001 --> 00:47:38,000 link so Heather can share it with you, and you can load it. 599 00:47:38,000 --> 00:47:43,000 It's hard to see, but this file is in 80 different pieces, on 80 600 00:47:43,000 --> 00:47:49,000 different nodes. You can see the distribution of the pieces. And it's that 601 00:47:49,000 --> 00:47:53,001 easy. To show you what we've accomplished with the Internet Archive, I'm going to 602 00:47:53,001 --> 00:47:58,000 go to the root of their site. I'm going to pop into the 603 00:47:58,000 --> 00:48:00,000 book collection, and then I'm going to go to their most 604 00:48:00,000 --> 00:48:02,000 popular book, The Art of War. 605 00:48:03,000 --> 00:48:09,000 For those of us who haven't read it recently or are unfamiliar with 606 00:48:09,000 --> 00:48:14,000 it, The Art of War is really a book about avoiding war, right? War is failure. It's about 607 00:48:14,000 --> 00:48:21,000 using diplomatic ties to defuse conflict. With 608 00:48:21,000 --> 00:48:25,000 the Internet Archive, we've uploaded these 16,000-plus assets. 609 00:48:26,000 --> 00:48:33,000 And thanks to Arkady, you can see that all the files related to 610 00:48:33,000 --> 00:48:39,000 this item are available over at Storj, and they'll also be available at the 611 00:48:39,000 --> 00:48:44,000 Internet Archive. So you can see how they're using us. It's a programmatic 612 00:48:44,000 --> 00:48:48,001 interaction, so they're able to batch upload. And you can see how easy it is to just 613 00:48:48,001 --> 00:48:55,000 use our simple web UI to upload an object and share it.
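Dominic describes files being encrypted, split, and spread across nodes, with 80 pieces in the demo. Storj uses Reed-Solomon-style erasure coding for the splitting step; the Python sketch below swaps in the simplest possible erasure code, k data shards plus one XOR parity shard, which survives the loss of any single piece. It omits the encryption step (which would happen client-side, before splitting), and the function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\x00")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(pieces: list, original_len: int) -> bytes:
    """Rebuild the data when at most one piece (data or parity) is None."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity shard can repair only one lost piece")
    pieces = list(pieces)
    if missing:
        # The XOR of all surviving pieces equals the missing one.
        pieces[missing[0]] = reduce(xor_bytes, [p for p in pieces if p is not None])
    return b"".join(pieces[:-1])[:original_len]


audio = b"Placeholder bytes standing in for a LibriVox MP3 file."
pieces = split_with_parity(audio, k=4)  # 4 data pieces + 1 parity piece
pieces[2] = None                        # the node holding piece 2 went offline
print(recover(pieces, len(audio)) == audio)  # True
```

A real Reed-Solomon code generalizes this: with k data pieces and n total, any k of the n pieces suffice, so many nodes can disappear at once without data loss.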
614 00:48:55,000 --> 00:49:00,000 And that is backed by a decentralized network. I'm going to hop back to the 615 00:49:00,000 --> 00:49:06,000 presentation, and then hop over to the next slide, which is a summary of what 616 00:49:06,000 --> 00:49:10,000 we've accomplished, a summary that you can stream on the right here. You can see 617 00:49:10,000 --> 00:49:13,001 what it looks like. We've made a mock-up in the center, as well as the list of items 618 00:49:13,001 --> 00:49:18,000 on the right-hand side. And then I'm just going to jump to a final slide and 619 00:49:18,000 --> 00:49:22,001 cover a few more things about the network. So what we're really excited about at 620 00:49:22,001 --> 00:49:27,000 Storj is that we're given the creative freedom to produce what we need to be 621 00:49:27,000 --> 00:49:32,000 successful, that is, to build what people want. So when we're talking about things 622 00:49:32,000 --> 00:49:37,001 like IPFS (and Heather, I'm going to send you another link to share), this same 623 00:49:37,001 --> 00:49:44,000 image that we just uploaded has also been shared via an IPFS hash. The link I sent you 624 00:49:44,000 --> 00:49:50,001 should be embedded in the chat now, the ipfsdemo.dev.storage.io, showing 625 00:49:50,001 --> 00:49:57,000 that not only is our storage decentralized, but it's content-addressable as well. 626 00:49:57,000 --> 00:50:00,000 Now, that's something that's not in production today, but it's coming very soon. 627 00:50:00,000 --> 00:50:06,000 It's just so fantastic to be in an org that provides so much opportunity 628 00:50:06,000 --> 00:50:07,001 to build great things for tomorrow. 629 00:50:08,000 --> 00:50:10,000 As for a little bit more detail, I see a 630 00:50:10,000 --> 00:50:11,001 question on how distribution occurs. 631 00:50:13,000 --> 00:50:19,001 We had a PhD economist actually build the model, right?
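Content addressing, mentioned for the IPFS demo above, means an object's address is derived from a hash of its bytes, so anyone holding the address can check what they received. The toy store below uses a raw SHA-256 hex digest as the address; real IPFS CIDs additionally encode the hash function and codec and chunk large files into a Merkle DAG, so they will not match this digest. `ContentAddressedStore` is a hypothetical name for illustration.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store where each object's address is the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data  # identical content always lands at the same address
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hash on read: the address itself proves whether the bytes are intact.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored bytes do not match their content address")
        return data


store = ContentAddressedStore()
addr = store.put(b"same image bytes")
assert store.put(b"same image bytes") == addr  # deduplicated by content
```

Because the address depends only on the content, the same image can be served from many locations while every copy stays verifiable against one identifier.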
So all of the nodes on 632 00:50:19,001 --> 00:50:23,000 our network (we don't run those nodes, by the way; those are people who come in 633 00:50:23,000 --> 00:50:27,000 and choose to run them) are incentivized to be good actors on the network. We 634 00:50:27,000 --> 00:50:31,001 can't trust that they will be, of course. So we have an audit and repair process 635 00:50:31,001 --> 00:50:37,001 that runs continually. That audit and repair process means that if a node drops 636 00:50:37,001 --> 00:50:42,000 off the network, or a node is misbehaving in the network, or a node is simply 637 00:50:42,000 --> 00:50:49,000 performing poorly, or the power is out, we can address that. We manage all 638 00:50:49,000 --> 00:50:55,001 repair. We manage all audit. There's no need to negotiate, for instance, and risk 639 00:50:55,001 --> 00:51:01,000 inconsistent pricing; you pay one price. And all of that is backed by a 640 00:51:01,000 --> 00:51:07,000 service level agreement (SLA), a contract where we guarantee a level of service to 641 00:51:07,000 --> 00:51:13,000 you. So we are a product that you can use in your production application today. 642 00:51:13,001 --> 00:51:17,001 You can get the benefits of that global distribution. 643 00:51:18,001 --> 00:51:23,001 If you want to be distributed, yet have data sovereignty, we do that as well. So 644 00:51:23,001 --> 00:51:29,000 if, for instance, you are seeking GDPR compliance and you want to be 645 00:51:29,000 --> 00:51:33,000 decentralized within Europe, and you don't want the data anywhere but the European 646 00:51:33,000 --> 00:51:38,000 Union, no problem. Similarly, if you need that in the United States for a 647 00:51:38,000 --> 00:51:44,001 reason, or in Canada, no problem. So today we're really the only provider giving 648 00:51:44,001 --> 00:51:50,001 you that decentralized storage solution with data sovereignty.
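The audit and repair process described here can be pictured as a pass over each file's pieces: discard pieces on nodes that dropped off, misbehaved, or failed an audit, and once the healthy count falls below a repair threshold, rebuild the missing pieces onto other nodes. The sketch assumes hypothetical thresholds (k = 29 to reconstruct, repair below 52, restore to 80) and only simulates the rebuild rather than performing real erasure decoding; `Piece` and `audit_and_repair` are illustrative names, not the network's API.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    node_id: str
    healthy: bool  # outcome of the most recent audit of this node (hypothetical)


def audit_and_repair(
    pieces: list[Piece],
    k: int,
    repair_threshold: int,
    target: int,
    spare_nodes: list[str],
) -> list[Piece]:
    """Return the piece set after one audit-and-repair pass over a file."""
    # Pieces on nodes that dropped off, misbehaved, or failed the audit are discarded.
    healthy = [p for p in pieces if p.healthy]
    if len(healthy) < k:
        raise RuntimeError("unrecoverable: fewer than k healthy pieces remain")
    if len(healthy) >= repair_threshold:
        return healthy  # enough redundancy left; no repair needed yet

    # Repair: a real system would download k healthy pieces, decode the file,
    # and re-encode the missing pieces. Here we only simulate placing them.
    occupied = {p.node_id for p in healthy}
    candidates = [n for n in spare_nodes if n not in occupied]
    needed = target - len(healthy)
    if len(candidates) < needed:
        raise RuntimeError("not enough spare nodes to restore full redundancy")
    return healthy + [Piece(node_id=n, healthy=True) for n in candidates[:needed]]


# Example: 80 pieces, 40 of whose nodes failed their audit.
pieces = [Piece(node_id=f"node-{i}", healthy=i >= 40) for i in range(80)]
spares = [f"spare-{i}" for i in range(100)]
repaired = audit_and_repair(pieces, k=29, repair_threshold=52, target=80, spare_nodes=spares)
print(len(repaired))  # 80
```

Repairing only below a threshold, rather than after every single loss, is one way to avoid paying the decode-and-re-encode cost for each brief node outage.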
Highly usable, 649 00:51:51,000 --> 00:51:57,000 decentralized storage with multiple on-ramps, making it easy, as you've seen, for 650 00:51:57,000 --> 00:52:02,001 the Internet Archive to decentralize that large catalog of audiobooks. 651 00:52:05,000 --> 00:52:12,000 It's truly wonderful, and I'm very fortunate to be here. Wendy, with that, I'm 652 00:52:12,000 --> 00:52:15,000 going to wrap up, and we can take care of the rest, of course, in Q&A. 653 00:52:16,001 --> 00:52:22,001 Great. Thank you so much. Let's call Arkady and Dominic back, and we'll stop 654 00:52:22,001 --> 00:52:29,000 sharing the screen and answer a few of your questions. So one of the questions 655 00:52:29,000 --> 00:52:35,001 is, how can you really prove from that hash that the originator did not fake 656 00:52:35,001 --> 00:52:42,001 the location? I guess the question is, how do we know that the hash is really 657 00:52:42,001 --> 00:52:49,001 trustworthy? As for John, during this conversation and presentation, you're 658 00:52:49,001 --> 00:52:55,000 very much right to ask that. The hash is only as trustworthy as the context of 659 00:52:55,000 --> 00:53:00,001 its creation. So obviously, we end up needing some means to establish a root 660 00:53:00,001 --> 00:53:07,001 of trust. One way that I can see this working out is if you imagine the 661 00:53:07,001 --> 00:53:11,001 Internet Archive as both a catalog and a data store. 662 00:53:12,000 --> 00:53:14,000 Right now, you need both places. 663 00:53:18,001 --> 00:53:25,000 "Archival bond": I'm actually not familiar with this term. So imagine the catalog 664 00:53:25,000 --> 00:53:32,000 and the data store as sort of separate ideas. If you trust the catalog, you 665 00:53:32,000 --> 00:53:39,000 don't necessarily have to trust the data store, as long as they are linked through a 666 00:53:39,000 --> 00:53:44,000 cryptographically secure hash.
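The catalog/data-store split described in this answer can be sketched as follows: trust only a catalog that maps each item to its expected SHA-256 digest, fetch bytes from any number of untrusted mirrors, and accept the first copy whose digest matches. `fetch_verified` and the dict-backed mirrors are hypothetical, and this check only proves the bytes match the catalog, not that the catalog itself is honest.

```python
import hashlib
from collections.abc import Callable, Iterable

Fetcher = Callable[[str], bytes]  # hypothetical: any mirror that returns raw bytes


def fetch_verified(catalog: dict[str, str], item_id: str, sources: Iterable[Fetcher]) -> bytes:
    """Return the first copy of item_id whose SHA-256 matches the trusted catalog."""
    expected = catalog[item_id]  # the catalog is the only component we must trust
    for fetch in sources:
        try:
            data = fetch(item_id)
        except (OSError, KeyError):
            continue  # mirror unreachable or missing the item; try the next one
        if hashlib.sha256(data).hexdigest() == expected:
            return data
        # Digest mismatch: this mirror served altered or corrupted bytes, so skip it.
    raise LookupError(f"no mirror returned a copy of {item_id!r} matching the catalog")


# Example: one tampered mirror, one honest mirror.
original = b"The Art of War (scan)"
catalog = {"artofwar": hashlib.sha256(original).hexdigest()}
tampered = {"artofwar": b"The Art of War (edited)"}
honest = {"artofwar": original}
data = fetch_verified(catalog, "artofwar", [tampered.__getitem__, honest.__getitem__])
```

Only the small catalog needs integrity-protected distribution; the bulk data can come from whichever network happens to be reachable.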
So you can imagine, for example, a 667 00:53:44,000 --> 00:53:49,000 censorship-resistant Internet Archive where you only need to ensure the integrity of the 668 00:53:49,000 --> 00:53:54,001 transmission of the catalog portion, and then the data can be retrieved from any 669 00:53:54,001 --> 00:54:00,001 number of decentralized networks underlying it, and that relationship is trustless. So 670 00:54:00,001 --> 00:54:04,001 here's a question about decentralized storage working with digital preservation. 671 00:54:05,000 --> 00:54:11,001 How does it handle, for instance, file obsolescence? File obsolescence. So let's 672 00:54:11,001 --> 00:54:15,001 dig a little bit deeper into that. That is the concept of a file no longer being 673 00:54:15,001 --> 00:54:19,001 necessary after a period of time. Is that how we want to think about it? Or are 674 00:54:19,001 --> 00:54:26,000 we thinking about something like bit rot? I would guess the first, and 675 00:54:26,000 --> 00:54:32,001 maybe Dina can help us there. But let's say you don't need this file anymore. 676 00:54:32,001 --> 00:54:38,001 It's defunct. How easy is it to get rid of files, take them down? In other words, 677 00:54:39,001 --> 00:54:44,000 is it like Glacier, where it's really hard to move things around? Or can you go in 678 00:54:44,000 --> 00:54:46,001 and change things kind of at will? 679 00:54:47,000 --> 00:54:53,000 I can dive in on that. So at Storj, we say it's hot storage for the price of 680 00:54:53,000 --> 00:54:59,000 cold. So there's no, for instance, auto-tiering or lower-tier layer. We won't 681 00:54:59,000 --> 00:55:04,000 let you erasure code to a non-ideal ratio. We just do it right. That being said, 682 00:55:04,001 --> 00:55:09,001 however you handle your file management, and this is true for all storage back 683 00:55:09,001 --> 00:55:16,001 ends, will be how you manage the archiving and potential deletion of assets 684 00:55:16,001 --> 00:55:21,001 after a period of time.
The storage back ends generally wouldn't be responsible for 685 00:55:21,001 --> 00:55:28,001 that. It is managed in the archive itself. Well, with that, I think we are going to 686 00:55:28,001 --> 00:55:34,001 wrap up this session and ask everyone to join us at the next session for 687 00:55:34,001 --> 00:55:41,000 more to and fro. We hope you enjoyed what you heard today. It was just a 688 00:55:41,000 --> 00:55:46,001 taste, a beginning. So please come back for more. We're doing this for six 689 00:55:46,001 --> 00:55:51,000 months, on the last Thursday of every month (this is number two) at 1pm 690 00:55:51,000 --> 00:55:53,001 Pacific, 4pm Eastern. 691 00:55:53,001 --> 00:55:57,000 We have four more sessions coming up. The next one, in March, 692 00:55:57,000 --> 00:55:58,001 is on decentralized identity. 693 00:55:59,001 --> 00:56:05,000 And as mentioned, we've also developed this really beautiful resource guide with 694 00:56:05,000 --> 00:56:10,001 different videos, links to other companies that do this and other 695 00:56:10,001 --> 00:56:12,000 organizations, and deeper-dive reading. 696 00:56:13,000 --> 00:56:17,001 So please take a look. We're dropping the link to it in the chat. It will also be 697 00:56:17,001 --> 00:56:22,000 emailed to you if you registered. And please share it widely. That's why 698 00:56:22,000 --> 00:56:24,001 it's there. Finally, I just want to say thank you.